Near-optimal bounds for phase synchronization
The problem of phase synchronization is to estimate the phases (angles) of a
complex unit-modulus vector $z$ from their noisy pairwise relative measurements
$C = zz^* + \sigma W$, where $W$ is a complex-valued Gaussian random matrix.
The maximum likelihood estimator (MLE) is a solution to a unit-modulus
constrained quadratic programming problem, which is nonconvex. Existing works
have proposed polynomial-time algorithms such as a semidefinite relaxation
(SDP) approach or the generalized power method (GPM) to solve it. Numerical
experiments suggest both of these methods succeed with high probability for
$\sigma$ up to $\tilde{O}(\sqrt{n})$, yet existing analyses only confirm this
observation for $\sigma$ up to $O(n^{1/4})$. In this paper, we bridge the gap
by proving that the SDP is tight for $\sigma = O(\sqrt{n/\log n})$, and that
GPM converges to the global optimum under the same regime. Moreover, we
establish a linear convergence rate for GPM, and derive a tighter $\ell_\infty$
bound for the MLE. A novel technique we develop in this paper is to track
(theoretically) $n$ closely related sequences of iterates, in addition to the
sequence of iterates GPM actually produces. As a by-product, we obtain an
$\ell_\infty$ perturbation bound for leading eigenvectors. Our result also
confirms intuitions that use techniques from statistical mechanics.
Comment: 34 pages, 1 figure
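The generalized power method discussed above admits a very short implementation. The sketch below is only an illustration under assumed details -- spectral initialization followed by an entrywise phase projection of $Cx$ -- and is not claimed to be the exact variant analyzed in the paper.

    import numpy as np

    def generalized_power_method(C, n_iter=100):
        """Sketch of the generalized power method for phase synchronization.

        C is the n x n Hermitian measurement matrix C = z z^* + sigma * W.
        Starting from the leading eigenvector, each iterate is projected
        entrywise onto the complex unit circle."""
        x = np.linalg.eigh(C)[1][:, -1]               # spectral initialization
        x = x / np.maximum(np.abs(x), 1e-12)
        for _ in range(n_iter):
            y = C @ x
            x = y / np.maximum(np.abs(y), 1e-12)      # entrywise phase projection
        return x

    # Usage on a synthetic instance C = z z^* + sigma * W.
    rng = np.random.default_rng(0)
    n, sigma = 500, 2.0
    z = np.exp(1j * rng.uniform(0, 2 * np.pi, size=n))
    W = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / 2
    W = W + W.conj().T                                # Hermitian Gaussian noise
    C = np.outer(z, z.conj()) + sigma * W
    z_hat = generalized_power_method(C)
    print("correlation |<z_hat, z>| / n =", np.abs(np.vdot(z_hat, z)) / n)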
The Interpolation Phase Transition in Neural Networks: Memorization and Generalization under Lazy Training
Modern neural networks are often operated in a strongly overparametrized
regime: they comprise so many parameters that they can interpolate the training
set, even if actual labels are replaced by purely random ones. Despite this,
they achieve good prediction error on unseen data: interpolating the training
set does not induce overfitting. Further, overparametrization appears to be
beneficial in that it simplifies the optimization landscape. Here we study
these phenomena in the context of two-layers neural networks in the neural
tangent (NT) regime. We consider a simple data model, with isotropic feature
vectors in $d$ dimensions, and $N$ hidden neurons. Under the assumption
$N \le d^{C}$ (for a constant $C$), we show that the network can exactly
interpolate the data as soon as the number of parameters is significantly
larger than the number of samples: $Nd \gg n$. Under these assumptions, we show that the
empirical NT kernel has minimum eigenvalue bounded away from zero, and
characterize the generalization error of min-$\ell_2$ norm interpolants, when
the target function is linear. In particular, we show that the network
approximately performs ridge regression in the raw features, with a strictly
positive `self-induced' regularization.
Comment: 69 pages, 4 figures
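As a concrete illustration of the objects in this abstract, the snippet below builds the empirical NT kernel of a two-layer ReLU network (gradients taken with respect to the first-layer weights only), checks its minimum eigenvalue, and forms the min-$\ell_2$ norm interpolant in the NT feature space. The activation, scalings, and problem sizes are illustrative assumptions, not the paper's exact setup.

    import numpy as np

    def nt_features(X, W, a):
        """NT feature map of f(x) = sum_j a_j * relu(<w_j, x>) with respect to
        the first-layer weights: phi(x) = concat_j a_j * relu'(<w_j, x>) * x."""
        act = (X @ W.T > 0).astype(float)             # relu'(<w_j, x_i>), (n, N)
        return (act * a[None, :])[:, :, None] * X[:, None, :]   # (n, N, d)

    rng = np.random.default_rng(0)
    n, d, N = 200, 50, 100                            # number of parameters Nd >> n
    X = rng.standard_normal((n, d)) / np.sqrt(d)      # isotropic feature vectors
    W = rng.standard_normal((N, d))
    a = rng.choice([-1.0, 1.0], size=N) / np.sqrt(N)

    Phi = nt_features(X, W, a).reshape(n, N * d)
    K = Phi @ Phi.T                                   # empirical NT kernel, (n, n)
    print("lambda_min(K) =", np.linalg.eigvalsh(K)[0])

    # Min-l2-norm interpolation of a linear target in the NT feature space.
    y = X @ rng.standard_normal(d)
    theta = Phi.T @ np.linalg.solve(K, y)             # theta = Phi^T (Phi Phi^T)^{-1} y
    print("max train residual:", np.max(np.abs(Phi @ theta - y)))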
Differentially Private Data Releasing for Smooth Queries with Synthetic Database Output
We consider accurately answering smooth queries while preserving differential
privacy. A query is said to be $K$-smooth if it is specified by a function
defined on $[-1,1]^d$ whose partial derivatives up to order $K$ are all
bounded. We develop an $\epsilon$-differentially private mechanism for the
class of $K$-smooth queries. The major advantage of the algorithm is that it
outputs a synthetic database. In real applications, a synthetic database output
is appealing. Our mechanism achieves an accuracy that decays polynomially in
the database size, and runs in polynomial time. We also
generalize the mechanism to preserve $(\epsilon, \delta)$-differential privacy
with slightly improved accuracy. Extensive experiments on benchmark datasets
demonstrate that the mechanisms have good accuracy and are efficient.
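As background only (this is not the paper's synthetic-database mechanism), the simplest way to answer a single numerical query under $\epsilon$-differential privacy is the Laplace mechanism; the query and sensitivity below are illustrative assumptions.

    import numpy as np

    def laplace_mechanism(db, query, sensitivity, epsilon, rng=None):
        """Generic Laplace mechanism: answer query(db) with noise calibrated to
        the query's global sensitivity, giving epsilon-differential privacy."""
        rng = np.random.default_rng() if rng is None else rng
        return query(db) + rng.laplace(scale=sensitivity / epsilon)

    # Example: a mean query over records clipped to [-1, 1]; changing one of
    # n records changes the mean by at most 2/n, so the sensitivity is 2/n.
    db = np.clip(np.random.default_rng(1).standard_normal(1000), -1, 1)
    noisy_mean = laplace_mechanism(db, np.mean, sensitivity=2 / len(db), epsilon=0.5)
    print(noisy_mean)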
Unraveling Projection Heads in Contrastive Learning: Insights from Expansion and Shrinkage
We investigate the role of projection heads, also known as projectors, within
the encoder-projector framework (e.g., SimCLR) used in contrastive learning. We
aim to demystify the observed phenomenon where representations learned before
projectors outperform those learned after them -- measured using downstream
linear classification accuracy -- even when the projectors themselves are linear.
In this paper, we make two significant contributions towards this aim.
Firstly, through empirical and theoretical analysis, we identify two crucial
effects -- expansion and shrinkage -- induced by the contrastive loss on the
projectors. In essence, contrastive loss either expands or shrinks the signal
direction in the representations learned by an encoder, depending on factors
such as the augmentation strength, the temperature used in contrastive loss,
etc. Secondly, drawing inspiration from the expansion and shrinkage phenomenon,
we propose a family of linear transformations to accurately model the
projector's behavior. This enables us to precisely characterize the downstream
linear classification accuracy in the high-dimensional asymptotic limit. Our
findings reveal that linear projectors operating in the shrinkage (or
expansion) regime hinder (or improve) the downstream classification accuracy.
This provides the first theoretical explanation as to why (linear) projectors
impact the downstream performance of learned representations. Our theoretical
findings are further corroborated by extensive experiments on both synthetic
data and real image data.
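A minimal synthetic experiment in the spirit of the shrinkage/expansion effect described above (the data model, the ridge-regularized linear probe, and the scaling factors are all illustrative assumptions, not the paper's model): a linear "projector" rescales the signal direction of the representations, and the downstream probe's test accuracy is compared under shrinkage and expansion.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_train, n_test, mu = 200, 300, 2000, 1.0

    def sample(n):
        # Signal lives along the first coordinate; the rest is isotropic noise.
        y = rng.choice([-1.0, 1.0], size=n)
        X = rng.standard_normal((n, d))
        X[:, 0] += mu * y
        return X, y

    def ridge_probe_accuracy(alpha, lam=300.0):
        # Linear "projector": rescale the signal direction by alpha.
        D = np.ones(d)
        D[0] = alpha
        Xtr, ytr = sample(n_train)
        Xte, yte = sample(n_test)
        Xtr, Xte = Xtr * D, Xte * D
        # Ridge-regularized linear probe trained on the projected representations.
        w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(d), Xtr.T @ ytr)
        return np.mean(np.sign(Xte @ w) == yte)

    for alpha in [0.1, 1.0, 10.0]:   # shrinkage, identity, expansion
        print(f"alpha={alpha:5.1f}  test accuracy={ridge_probe_accuracy(alpha):.3f}")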
Tractability from overparametrization: The example of the negative perceptron
In the negative perceptron problem we are given $n$ data points $(x_i, y_i)$,
$i \le n$, where $x_i$ is a $d$-dimensional vector and $y_i \in \{+1, -1\}$ is
a binary label. The data are not linearly separable and hence we content
ourselves with finding a linear classifier with the largest possible
\emph{negative} margin. In other words, we want to find a unit norm vector
$\theta$ that maximizes $\min_{i \le n} y_i \langle \theta, x_i \rangle$. This
is a non-convex
optimization problem (it is equivalent to finding a maximum norm vector in a
polytope), and we study its typical properties under two random models for the
data.
We consider the proportional asymptotics in which $n, d \to \infty$ with
$n/d \to \delta$, and prove upper and lower bounds on the maximum margin
$\kappa_{\mathrm{s}}(\delta)$ or -- equivalently -- on its inverse function
$\delta_{\mathrm{s}}(\kappa)$. In other words, $\delta_{\mathrm{s}}(\kappa)$ is
the overparametrization threshold: for $n/d \le \delta_{\mathrm{s}}(\kappa) -
\varepsilon$ a classifier achieving vanishing training error exists with high
probability, while for $n/d \ge \delta_{\mathrm{s}}(\kappa) + \varepsilon$ it
does not. Our bounds on $\delta_{\mathrm{s}}(\kappa)$ match to the leading
order as $\kappa \to -\infty$.
We then analyze a linear programming algorithm to find a solution, and
characterize the corresponding threshold $\delta_{\mathrm{lin}}(\kappa)$. We
observe a gap between the interpolation threshold $\delta_{\mathrm{s}}(\kappa)$
and the linear programming threshold $\delta_{\mathrm{lin}}(\kappa)$, raising
the question of the behavior of other algorithms.
Comment: 88 pages; 7 pdf figures
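The nonconvex max-margin problem above can be attacked with simple convex heuristics. The sketch below sets up a toy non-separable instance and solves one such linear program (maximize the margin under a linear normalization constraint that rules out $\theta = 0$, then rescale to unit norm); this LP is an illustrative stand-in and is not necessarily the linear programming algorithm analyzed in the paper.

    import numpy as np
    from scipy.optimize import linprog

    # Toy negative-perceptron instance: with n/d well above the interpolation
    # threshold, random labels are not linearly separable and the best margin
    # is negative.
    rng = np.random.default_rng(0)
    n, d = 500, 100
    X = rng.standard_normal((n, d)) / np.sqrt(d)
    y = rng.choice([-1.0, 1.0], size=n)
    Z = y[:, None] * X                                # rows z_i = y_i * x_i

    # LP heuristic: maximize t subject to <theta, z_i> >= t for all i, with
    # the normalization <theta, v> = 1 (v = mean of the z_i) to exclude theta = 0.
    v = Z.mean(axis=0)
    c = np.concatenate([np.zeros(d), [-1.0]])         # minimize -t
    A_ub = np.hstack([-Z, np.ones((n, 1))])           # -<theta, z_i> + t <= 0
    b_ub = np.zeros(n)
    A_eq = np.concatenate([v, [0.0]])[None, :]
    b_eq = np.array([1.0])
    bounds = [(-10.0, 10.0)] * d + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)

    theta = res.x[:d] / np.linalg.norm(res.x[:d])     # rescale to unit norm
    print("achieved (negative) margin:", np.min(Z @ theta))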